Abstract. This paper reports on an on-going project to investigate techniques to
diagnose complex dynamical systems that are modeled as hybrid systems. In par- 
ticular, we examine continuous systems with embedded supervisory controllers 
that experience abrupt, partial or full failure of component devices. We cast the 
diagnosis problem as a model selection problem. To reduce the space of potential
models under consideration, we exploit techniques from qualitative reasoning to 
conjecture an initial set of qualitative candidate diagnoses, which induce a smaller 
set of models. We refine these diagnoses using parameter estimation and model 
fitting techniques. As a motivating case study, we have examined the problem of 
diagnosing NASA’s Sprint AERCam, a small spherical robotic camera unit with 
12 thrusters that enable both linear and rotational motion. 

1 Introduction 

The objective of our project has been to investigate how to diagnose hybrid systems 
- complex dynamical systems whose behavior is modeled as a hybrid system. Hybrid 
models comprise both discrete and continuous behavior. They are typically represented 
as a sequence of piecewise continuous behaviors interleaved with discrete transitions 
(e.g., [7]). Each period of continuous behavior represents a so-called mode of the system.
For example, in the case of NASA’s Sprint AERCam, modes might include translate x-axis,
rotate x-axis, translate y-axis, etc. [1]. In the case of an Airbus fly-by-wire
system, modes might include take-off, landing, climbing, and cruise. Mode transitions 
generally result in changes to the set of equations governing the continuous behavior of 
the system, as well as to the state vector that initializes that behavior in the new mode. 
Discrete transitions that dictate mode switching are modeled by finite state automata, 
temporal logics, switching functions, or some other transition system, while continuous 
behavior within a mode is modeled by, e.g., ordinary differential equations (ODEs) or 
differential and algebraic equations (DAEs). 

The problem we address in this paper is how to diagnose such hybrid systems. For 
the purposes of this paper, we consider the class of hybrid systems that are continuous 
systems with an embedded supervisory controller, but whose hybrid models contain no 
autonomous jumps. That is, all nominal transitions between system modes are induced by
a controller action; none are induced by the system state and model [7]. The class of
systems we consider can be modeled as a composition of a set of component subsys- 
tems, each of which is itself a hybrid system. We assume that the system operation is 
being tracked by a monitoring and observer system (e.g., [19]) that ensures that the system
behavior predicted by the model does not deviate significantly from the observed
behavior in normal system operation. When observations occur outside this range, the
behavior is deemed to be aberrant and diagnosis is initiated. In this paper, we consider
faults whose onset is abrupt, and which result in partial or complete degradation of 
component behavior. The general problem we wish to address can be stated as follows: 
Given a hybrid model of system behavior, a history of executed controller actions, a history
of observations, including observations of aberrant behavior relative to the model,
isolate the fault that is the cause for the aberrant behavior. Diagnosis is done online 
in conjunction with the continued operation of the system. Hence, we divide our diagnosis
task into two stages: initial conjecturing of candidate diagnoses, and subsequent
refinement and tracking to select the most likely diagnoses.

In this paper we conceive the diagnosis problem as a model selection problem. The 
task is to find a mathematical model and associated parameter values that best fit the sys- 
tem data. These models dictate the components of the system that have malfunctioned, 
their mode of failure, the estimated time of failure, and any additional parameters that
further characterize the failure. To address this diagnosis problem, we propose to ex- 
ploit AI techniques for qualitative diagnosis of continuous systems to generate an initial 
set of qualitative candidate diagnoses and associated models, thus drastically reducing 
the number of potential models for our system. This is followed by parameter estima- 
tion and model fitting techniques to select the most likely mode and system parameters 
for candidate models of system behavior, given both past and subsequent observations 
of system behavior and controller actions. The main contributions of the paper are: 1) 
formulation of the hybrid diagnosis problem; 2) the exploitation of techniques for qual- 
itative diagnosis of continuous systems to reduce the diagnosis search space; and 3) the 
use of parameter estimation and data fitting techniques for evaluation and comparison 
of candidate diagnoses. 

In Section 2 we provide a brief description of NASA’s Sprint AERCam, which we 
have used as a motivating example and which we will use to illustrate certain concepts 
in this paper. In Section 3 we present a formal characterization of the class of hybrid 
systems we study and the diagnosis problem they present. In Section 4 we describe our 
approach to hybrid diagnosis and the algorithms we use to achieve hybrid diagnosis. 
The generation of initial candidate qualitative diagnoses is described in Section 4.1,
and the subsequent quantitative fitting and tracking of candidate diagnoses and their 
models is described in Section 4.2. In the final two sections, we briefly discuss related 
work and summarize our contributions. 

2 Motivating Example: The AERCam 

We are using NASA’s Sprint AERCam and a simulation of system dynamics and the
controller written in Hybrid CC (HCC) as a testbed for this work. We describe the
dynamic model of the AERCam system briefly; a more detailed description of the model
and simulation appears in [1].

The AERCam is a small spherical robotic camera unit, with 12 thrusters that allow 
both linear and rotational motion (Fig. 1). For the purposes of this model, we assume 
the sphere is uniform, and the fuel that powers the movement is in the center of the
sphere. The fuel depletes as the thrusters fire.



[Figure 1 panel captions: The body frame of reference and the directions of velocities;
(u, v, w) are the components of the translational velocity, while (p, q, r) are components
of the angular velocity. Three views of the AERCam, showing the thrusters, and showing
all the thrusters together in the cube circumscribing the AERCam.]

Fig. 1. The AERCam axes and thrusters


The dynamics of the AERCam are described in the AERCam body frame of refer- 
ence. The translation velocity of this frame with respect to the shuttle inertial frame of 
reference is 0. However, its orientation is the same as the orientation of the AERCam, 
thus its orientation with respect to the shuttle reference frame changes as the AERCam 
rotates (i.e., it is not an inertial frame). The twelve thrusters are aligned so that there 
are four along each major axis in the AERCam body frame. For modeling purposes, 
we assume the positions of the thrusters are on the centers of the edges of a cube
circumscribing the AERCam. Thus, for example, thrusters T1, T2, T3, T4 are parallel to
the x-axis and are used for translation along the x-axis or rotation around the y-axis.
I.e., firing thrusters T1 and T2 results in translation along the positive x-axis, and firing
thrusters T1 and T4 results in a negative rotation around the y-axis. AERCam operations
are simplified by limiting them to either translation or rotation. Thrusters are either on
or off; therefore, the control actions are discrete. In a normal mode of operation, only
two thrusters are on at any time. 


2.1 AERCam dynamics 

A simplified model of the AERCam dynamics based on Newtonian laws is derived using
an inertial frame of reference fixed to the space shuttle. The AERCam position in
this frame is defined as the triple (x, y, z). Let V be the velocity in the AERCam body
frame, with its vector components given by (u, v, w). The frame rotates with respect
to the inertial reference frame with velocity ω = (p, q, r), the angular velocity of the
AERCam. The rotating body frame implies an additional Coriolis force acting upon the
AERCam. We assume uniform rotational velocity since in the normal mode of operation,
the AERCam does not translate and rotate at the same time [2, pg. 130]. Similar
equations can be derived for the rotational dynamics [1]. 

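\[ \frac{d(m\mathbf{V})}{dt} = \mathbf{F} - 2m\,(\boldsymbol{\omega} \times \mathbf{V}) \qquad \text{(Newton's Law)} \]
\[ \mathbf{V}\,\frac{dm}{dt} + m\,\frac{d\mathbf{V}}{dt} = \mathbf{F} - 2m\,(\boldsymbol{\omega} \times \mathbf{V}) \]

The resultant equation for each coordinate:

\[ \frac{du}{dt} = \frac{F_x}{m} - 2\,(qw - vr) - \frac{u}{m}\,\frac{dm}{dt} \]
\[ \frac{dv}{dt} = \frac{F_y}{m} - 2\,(ru - pw) - \frac{v}{m}\,\frac{dm}{dt} \]
\[ \frac{dw}{dt} = \frac{F_z}{m} - 2\,(pv - qu) - \frac{w}{m}\,\frac{dm}{dt} \]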
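
As a rough illustration of how the component equations above could be evaluated numerically, the following sketch computes the right-hand sides and advances (u, v, w) by one forward-Euler step. The function names, the Euler scheme, and the numerical values in the usage note are illustrative assumptions; they are not taken from the HCC simulation described in [1].

```python
def velocity_derivatives(vel, omega, force, mass, dm_dt):
    """Right-hand side of the translational equations in the body frame.

    vel = (u, v, w), omega = (p, q, r), force = (Fx, Fy, Fz).
    """
    u, v, w = vel
    p, q, r = omega
    fx, fy, fz = force
    # Components of the cross product omega x V.
    cx = q * w - r * v
    cy = r * u - p * w
    cz = p * v - q * u
    du = fx / mass - 2.0 * cx - (u / mass) * dm_dt
    dv = fy / mass - 2.0 * cy - (v / mass) * dm_dt
    dw = fz / mass - 2.0 * cz - (w / mass) * dm_dt
    return (du, dv, dw)


def euler_step(vel, omega, force, mass, dm_dt, dt):
    """Advance the body-frame velocity by one forward-Euler step (assumed scheme)."""
    derivs = velocity_derivatives(vel, omega, force, mass, dm_dt)
    return tuple(x + dt * dx for x, dx in zip(vel, derivs))
```

For example, with no rotation and no fuel loss, a force of 2 along x on a mass of 4 gives an acceleration of 0.5 along x; with a unit rotation rate about z and unit velocity along x, the Coriolis term alone produces dv/dt = -2.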


2.2 Position Control Mode of the AERCam 

In the position control mode, the AERCam is directed to go to a specified position and
point the camera in a particular direction. Assume the AERCam is at position A and 
directed to go to position B. In the first phase, the AERCam rotates to get one set of 
thrusters pointed towards B. These are then fired, and the AERCam cruises towards B. 
Upon reaching a position close to B, it fires thrusters to converge to B, and then rotates 
to point the camera in the desired direction.

To facilitate the illustration of the diagnosis problem, we use a simple trapezoidal 
controller, which we explain in two dimensions. Suppose the task is to travel along 
the x-axis for some distance, then along the y-axis. Such manoeuvres are needed for
navigating in the space shuttle. In order to do this, the AERCam fires its x thrusters
for some time. Upon reaching the desired velocity, these are switched off. When the
AERCam has reached a position close to the desired x position, the reverse thrusters are
switched on, and the AERCam is brought to a halt, so the velocity graph is a trapezium.
The process is analogous for the y direction. 
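
The trapezoidal profile described above can be sketched for a single axis as follows. This is a minimal illustration under stated assumptions: a point mass with constant acceleration magnitude, no fuel depletion, no rotation, and a braking distance computed from the cruise speed. The names `trapezoid_command` and `simulate_axis` and all numerical values are hypothetical.

```python
def trapezoid_command(position, velocity, target, cruise_speed, brake_distance):
    """Return the thruster command for one axis: +1 (forward), -1 (reverse) or 0 (off)."""
    remaining = target - position
    if remaining > brake_distance:
        # Accelerate until the cruise speed is reached, then coast.
        return 1 if velocity < cruise_speed else 0
    # Close to the target: fire the reverse thrusters until the vehicle halts.
    return -1 if velocity > 0.0 else 0


def simulate_axis(target, accel, cruise_speed, dt=0.01, max_steps=100000):
    """Simulate one axis and return the (time, velocity) trace."""
    # Distance needed to brake from the cruise speed at constant deceleration.
    brake_distance = cruise_speed ** 2 / (2.0 * accel)
    position, velocity, trace = 0.0, 0.0, []
    for step in range(max_steps):
        cmd = trapezoid_command(position, velocity, target, cruise_speed, brake_distance)
        # Discrete on/off thrusters: the command only selects the sign of the thrust.
        velocity = max(0.0, velocity + cmd * accel * dt)
        position += velocity * dt
        trace.append((step * dt, velocity))
        if cmd == 0 and velocity == 0.0:
            break
    return trace
```

Plotting the velocities returned by `simulate_axis(5.0, 1.0, 1.0)` against time gives a ramp up, a plateau, and a ramp down, i.e., the trapezium referred to in the text.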


3 Problem Formulation 

In this section we provide our formulation of the hybrid diagnosis problem. 

Definition 1 (Hybrid System). A hybrid system is a 5-tuple ⟨M, X, F, Σ, φ⟩, where

- M, finite set of system modes (μ_1, ..., μ_k).

- X ⊆ R^n, continuous state variables. x(t) is the continuous behavior at time t.

- F, finite set of functions {f_{μ_1}, ..., f_{μ_k}}, and associated parameter values θ such
that for each mode μ_i, f_{μ_i}(t, θ_i, x(t)) : R × R × X → X defines the continuous
behavior of the system in μ_i.¹

- Σ, finite set of actions (σ_1, ..., σ_l), which transition the system between modes.

- φ, transition function which maps an action, mode and system state vector into a
new mode and initial state vector, i.e., φ : Σ × M × X → M × X.

To define the hybrid diagnosis problem, we augment Definition 1 as follows. 

¹ Parameter value ranges may be associated with θ.



Definition 2 (Diagnosable Hybrid System). A diagnosable hybrid system,
⟨M, X, F, Σ, φ, COMPS⟩ is a hybrid system comprised of m potentially malfunctioning
components COMPS = {c_1, ..., c_m} where

- For each μ ∈ M, μ includes a designation of whether each c_i ∈ COMPS is
operating normally, or abnormally, i.e., (¬)ab(c_i).

- We assume that transitions to fault modes are achieved by exogenous actions.
Hence, Σ = Σ_c ∪ Σ_e, where

• Σ_c is a finite set of controller actions, and

• Σ_e is a finite set of exogenous actions.

- A, the controller action history, the sequence of time-indexed controller actions
performed.

- X_obs ⊆ X, continuous state variables that are observable. x_obs(t) denotes the
observations at time t.

- O, the observation history, the sequence of time-indexed observations.

For notational convenience, μ_F denotes a faulty mode, i.e., a mode for which at least
one c_i ∈ COMPS is ab(c_i) in μ_F. θ_F denotes the parameters associated with f_{μ_F}.

In the case of the AERCam example, the potentially malfunctioning components are
the 12 thrusters, and a mode μ includes the behavior mode (e.g., translate-x, translate-y,
rotate-x, etc.) and (¬)ab(T_i), i = 1, ..., 12, for each thruster. The continuous state
vector includes the x, y, z position of the AERCam, velocity and acceleration. The
parameter values, θ, associated with each f_μ are the percentage degradation of each of
the thrusters.

Definition 3 (Model). A model, Mod, of a diagnosable hybrid system is a time-indexed
mode sequence and associated parameter values ⟨[μ_1, ..., μ_m], [θ_1, ..., θ_m]⟩.

Notice that each model of the system, ⟨μ, θ⟩, induces a corresponding time-indexed
piecewise continuous sequence of functions [f_{μ_1}, ..., f_{μ_m}] dictating system behavior.

In this paper we make several simplifying assumptions regarding our diagnosis task. 
In particular, we make a single-time fault assumption. We assume that our systems do 
not experience multiple sequential faults. Further, we assume that faults are abrupt, 
resulting in partial or full degradation of component behavior. We cast the hybrid
diagnosis task as the problem of finding the most likely model for the observation history,
P(Mod | O), i.e., the sequence of modes and parameter values ⟨μ, θ⟩ that best fit the
observations over time. Under normal operation, the model of the system, Mod_normal, is
fully dictated by the sequence of controller actions A and the nominal parameter values,
θ. Once again, we assume that the system operation is being tracked by a monitoring and
observer system (e.g., [19]) that ensures that the system behavior predicted by the model
does not deviate significantly from the observed behavior in normal system operation.
When observations occur outside this range, the behavior is deemed to be aberrant and
diagnosis is initiated. Given a diagnosable hybrid system ⟨M, X, F, Σ, φ, COMPS⟩,
a controller action history, A, and a history of observations, O, which includes
observations of aberrant behavior, the hybrid diagnosis task is to determine what components
are faulty, what fault mode caused the aberrant behavior, when it occurred, and what the
values of the parameters associated with the fault mode are. In the AERCam system, a
diagnosis might be that thruster T1 experienced a blockage fault of 50% at time t.



Once Mod_normal has been rejected, we must find a new most likely model from
among the potentially exponential (in COMPS) number of mode sequences, occurring
within a large but bounded time range. We propose to exploit previous research on
temporal causal graphs for qualitative diagnosis of continuous systems [18], to compute
a set of candidate qualitative diagnoses that are consistent with our system, in order to 
identify a preliminary subset of candidate models, whose likelihood can be estimated. 

Definition 4 (D-tuple). A D-tuple is a 4-tuple ⟨C, μ_F, t_F, θ_F⟩, where μ_F is a fault
mode, t_F is the time the fault mode commenced, θ_F is the parameter values associated
with the fault mode behavior, and C is the set of failed (abnormal) components in μ_F.

Definition 5 (Candidate Qualitative Diagnosis). Given a diagnosable hybrid system
with model Mod = ⟨μ, θ⟩, an action history A, and a history of observations, O, which
includes observations of aberrant behavior, D-tuple ⟨C, μ_F, t_F, θ_F⟩ is a candidate
qualitative diagnosis iff there exists a range of parameter values θ_F = [θ_l, θ_u], and time
range t_F = [t_l, t_u] such that the occurrence of fault mode μ_F with parameter values θ_F
in time range t_F is consistent with O, A and Mod.

Hence, a candidate qualitative diagnosis stipulates a fault mode, including one or
more faulty components. It also stipulates a lower and upper bound, [t_l, t_u], on the time
the fault mode occurred. This range generally corresponds to the start times of the
controller-induced modes preceding and following the fault, or up to the point the fault was
detected. This candidate diagnosis induces an associated candidate model,
Mod_c = ⟨[μ_1, ..., μ_i, μ_F, μ'_{i+1}, ..., μ'_m], [θ_1, ..., θ_i, θ_F, θ'_{i+1}, ..., θ'_m]⟩,
corresponding to Mod with the fault mode μ_F and θ_F inserted at t_F. Every subsequent
mode, μ'_{i+1}, ..., μ'_m, has ab(c_i), c_i ∈ C enforced, and every subsequent set of
parameters has the parameters associated with faulty components C enforced. Computing
candidate qualitative diagnoses is discussed in Section 4.1.

Since each candidate qualitative diagnosis only conjectures ranges for the time of
the fault mode, t_F, and parameter values associated with the fault mode, θ_F, the
associated candidate models are underconstrained. In Section 4.2, we discuss methods for
estimating unique values for t_F and θ_F and for estimating a posterior probability for
each of the candidate models, Mod_c, given O.

Definition 6 (Candidate Diagnosis). Given a diagnosable hybrid system, a history of
controller actions A, and a history of observations O, D-tuple ⟨C, μ_F, t_F, θ_F⟩ with
associated model Mod_c is a candidate diagnosis for the hybrid system iff
P(Mod_c | O) > α, for defined threshold value α ∈ [0, 1].
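
Definitions 4 and 6 can be read as a small data structure plus a threshold test. The sketch below is one possible encoding, not the representation used in the project: the class name `DTuple`, the field names, and the assumption that a posterior P(Mod_c | O) has already been computed for each D-tuple are all illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class DTuple:
    components: FrozenSet[str]          # C: failed components
    fault_mode: str                     # mu_F
    fault_time: Tuple[float, float]     # t_F, as a (lower, upper) range
    fault_params: Tuple[float, float]   # theta_F, as a (lower, upper) range


def candidate_diagnoses(scored: List[Tuple[DTuple, float]], alpha: float) -> List[DTuple]:
    """Keep the D-tuples whose model posterior P(Mod_c | O) exceeds alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [d for d, posterior in scored if posterior > alpha]
```

The comparison is strict, matching the inequality in Definition 6, so a D-tuple whose posterior equals the threshold is not retained.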


4 Diagnosing Hybrid Systems 

In this section we discuss one method for computing hybrid diagnoses. In Section 4.1 
we discuss a technique for generating candidate qualitative diagnoses, and their associ- 
ated candidate models. In Section 4.2 we discuss techniques for model fitting and for 
model (and hence diagnosis) comparison. In particular we discuss techniques for estimating
the parameters of the candidate models, and the likelihood of the models, and for
continued monitoring and refinement of the candidate models as the system continues
to operate and observations continue to be made.

We illustrate these techniques with the following simple AERCam example. Consider
the scenario depicted in Fig. 2. In the first accelerate phase, the AERCam is being
powered by thrusters T1 and T2. Assume that at some point in this phase, a sudden leak
in the T2 thruster causes an abrupt change in its output. As a consequence, the AERCam
starts veering to the right of the desired trajectory, as illustrated by the left-most
dotted lines in Fig. 2. (The other dotted lines represent other potential candidate
diagnoses consistent with the point of detection of the failure.) Soon after this occurs, the
supervisory controller commands the AERCam to turn off thrusters T1 and T2 with
the objective of getting the AERCam to cruise in a straight line. In the faulty situation,
the AERCam has some residual angular velocity about the z-axis, so it continues to
rotate in the cruise mode. Then the controller turns on thrusters T3 and T4, to decelerate
the AERCam with the objective of bringing it to a halt. Again, this objective is
not entirely achieved in the faulty situation. Next, thrusters T5 and T6 are switched
on, to move the AERCam in the y direction. However, since the AERCam is not in the
desired orientation after the failure, the position error due to faulty thruster T2
accumulates, causing a greater and greater deviation from the desired trajectory of the system.
The position of the AERCam is being continuously sensed, filtered for noise and
monitored. At some point within the y translation the trajectory exceeds the error bound,
i.e., P(Mod_normal) < α, and is flagged by the monitoring system as aberrant relative
to Mod_normal. At this point, the diagnosis task begins.

Fig. 2. Possible fault trajectories of AERCam (simplified for illustration purposes).


4.1 Qualitative Candidate Generation

Given the current system model Mod = ⟨μ, θ⟩ (commonly Mod_normal), a history of
controller actions A, and a history of observations O including one or more
observations of aberrant behavior, we wish to generate a set of candidate qualitative diagnoses
⟨C, μ_F, t_F, θ_F⟩, and associated candidate models as described in Definition 5. To do
so, we extend techniques for generating qualitative diagnoses of continuous dynamic 
systems to deal with hybrid systems with multiple modes. The model and propagation 
mechanism, as applied to continuous systems diagnosis, is described in [18]. 

In the case of our AERCam example, the action history A is [(on(T1), on(T2)),
(off(T1), off(T2)), (on(T3), on(T4)), (off(T3), off(T4), on(T5), on(T6)), (off(T5),
off(T6))]; the model, Mod_normal, is the time-indexed sequence [(accelerate-x,
¬ab(T1-T12), 0), (cruise-x, ¬ab(T1-T12), 0), (decelerate-x, ¬ab(T1-T12), 0), (accelerate-y,
¬ab(T1-T12), 0), (cruise-y, ¬ab(T1-T12), 0)], where 0 is a vector of length 12 all
of whose entries are 0 (percent degradation in thrusters).
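
Since the nominal model is fully dictated by the controller actions, the mode sequence above can be derived mechanically from the action history. The following sketch shows one way to do so for this example. The lookup table `THRUST_MODES`, the tuple encoding of actions, and the rule that an idle vehicle cruises along the axis it last travelled are assumptions made for illustration only.

```python
# Assumed lookup from the set of firing thrusters to a behavior mode name.
THRUST_MODES = {
    frozenset({"T1", "T2"}): "accelerate-x",
    frozenset({"T3", "T4"}): "decelerate-x",
    frozenset({"T5", "T6"}): "accelerate-y",
}


def nominal_model(action_history, n_thrusters=12):
    """Derive the nominal mode sequence from time-ordered controller actions."""
    active, axis, model = set(), None, []
    for actions in action_history:
        for kind, thruster in actions:
            if kind == "on":
                active.add(thruster)
            else:
                active.discard(thruster)
        if active:
            behavior = THRUST_MODES[frozenset(active)]
            axis = behavior[-1]
        elif axis is not None:
            # No thruster firing: coast along the axis last travelled.
            behavior = "cruise-" + axis
        else:
            raise ValueError("history must start by switching thrusters on")
        # (behavior, abnormal components, percent degradation per thruster)
        model.append((behavior, frozenset(), [0.0] * n_thrusters))
    return model


ACTION_HISTORY = [
    [("on", "T1"), ("on", "T2")],
    [("off", "T1"), ("off", "T2")],
    [("on", "T3"), ("on", "T4")],
    [("off", "T3"), ("off", "T4"), ("on", "T5"), ("on", "T6")],
    [("off", "T5"), ("off", "T6")],
]
```

Calling `nominal_model(ACTION_HISTORY)` yields the five modes accelerate-x, cruise-x, decelerate-x, accelerate-y and cruise-y, each with no abnormal thruster and a zero degradation vector of length 12.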

To generate candidate qualitative diagnoses we construct an abstract model of the
dynamic system behavior, Mod_normal, as a temporal causal graph. A part of the temporal
causal graph for the AERCam dynamics is shown in Fig. 3. The graph expresses
directed cause-effect relations between component parameters and the system state
variables. Links between variables are labeled as: (i) +1, implying direct proportionality,
(ii) -1, implying inverse proportionality, and (iii) ∫, implying an integrating relation.
An integrating relation introduces a temporal delay in that a change on the cause side of
the relation affects the derivative of the variable on the effect side. This adds temporal
characteristics to the relations between variables. Some edges are labeled by variables,
implying the sign of the variable in the particular situation defines the nature of the
relationship. The candidate generation algorithm is invoked for every initial instance of an
aberrant observation. The aberrant observation plus the controller action history A are
input to a backward propagation algorithm that operates on the temporal causal graph.

Fig. 3. A subset of the temporal causal graph showing the relations between thrusters T1-T8
and the x and y positions of the AERCam.



The algorithm operates backwards from the last mode in the mode sequence of Mod:

Step 1 For the current mode, extract the corresponding temporal causal graph model,
and apply the Identify Possible Faults algorithm. Details of this algorithm are presented
in [18], but the key aspect of this algorithm is to propagate the aberrant observation,
expressed as a ± value, backward depth-first through the graph. For example, given that
the y-position of the AERCam has deviated − (i.e., below normal), backward propagation
implies d(y)/dt is −, and so on, till we get T5− and T6−, implying thrusters
T5 and T6 are possibly faulty with decreased thrust performance. Propagation along a
path can terminate if conflicting assignments are made to a node. The goal is to
systematically propagate observed discrepancies backward to identify all possible candidate
hypotheses that are consistent with the observations. In our example, the component
parameters, COMPS = {T1, ..., T12} form the space of candidate faults.

Step 2 Repeat Step 1 for every mode in the mode sequence, back to μ_1. The system model
needs to be substituted as the algorithm traverses the mode sequence backwards. Therefore,
back propagation will be performed on a different temporal causal graph for each
mode in the controller history.²
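
The backward propagation in Step 1 can be sketched on a toy graph as follows. This is a simplified illustration, not the Identify Possible Faults algorithm of [18]: the graph fragment, the node names, and the treatment of conflicts (an already labelled node is simply not revisited) are assumptions, and the temporal ordering information carried by integrating edges is ignored.

```python
# Hypothetical fragment of a temporal causal graph for the accelerate-y mode.
# Each edge is (cause, effect, label); "int" marks an integrating relation.
EDGES = [
    ("T5", "Fy", +1),
    ("T6", "Fy", +1),
    ("Fy", "dvy/dt", +1),
    ("dvy/dt", "dy/dt", "int"),
    ("dy/dt", "y", "int"),
]


def identify_possible_faults(edges, observed, sign, components):
    """Propagate a +1/-1 deviation backward, depth-first, to the component parameters."""
    assigned = {observed: sign}
    stack = [observed]
    while stack:
        effect = stack.pop()
        for cause, target, label in edges:
            if target != effect:
                continue
            if cause in assigned:
                # Already labelled: this path ends here, even if the labels conflict.
                continue
            # A +1 or integrating link preserves the sign; a -1 link flips it.
            assigned[cause] = -assigned[effect] if label == -1 else assigned[effect]
            stack.append(cause)
    return {c: assigned[c] for c in components if c in assigned}
```

With a below-normal deviation in y, `identify_possible_faults(EDGES, "y", -1, ["T5", "T6", "T2"])` labels T5 and T6 with −, mirroring the example in the text, while T2 is not reachable in this mode's graph and is therefore not implicated.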

The output of this step is a set of qualitative diagnoses ⟨C, μ_F, t_F, θ_F⟩, each with
an associated candidate model, as described in Section 3. Returning to our AERCam
example, three qualitative candidate diagnoses are generated. The first candidate
diagnosis is that T2 failed in the x acceleration phase. The time of the fault mode transition
is [t_1, t_2], and the parameter associated with the failure, the percentage degradation
of the component, is in the range [0, 100]. So the first candidate qualitative diagnosis
is ⟨T2, (accelerate-x, ab(T2), ¬ab(T1, T3-T12), θ_F), [t_1, t_2], [0, 100]⟩. The candidate
model simply has (accelerate-x, ab(T2), ¬ab(T1), ¬ab(T3-T12)) inserted after
the mode (accelerate-x, ¬ab(T1-T12)), and ab(T2) enforced in every subsequent
mode. The second candidate qualitative diagnosis is that T4 failed in the deceleration
phase of x translation, i.e., ⟨T4, (decelerate-x, ab(T4), ¬ab(T1-T3, T5-T12), θ_F),
[t_3, t_4], [0, 100]⟩. The third candidate is that T6 failed during y acceleration, i.e., ⟨T6,
(accelerate-y, ab(T6), ¬ab(T1-T5, T7-T12), θ_F), [t_4, t_D], [0, 100]⟩, where t_D is
the time of detection of the aberrant behavior. In each case θ_F is a vector of length 12
with every entry equal to 0 (percentage degradation), except the entries corresponding
to the faulty thrusters, C, which will have the range [0, 100].
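
Constructing a candidate model from a qualitative diagnosis amounts to inserting the fault mode and enforcing ab(c) thereafter. The sketch below illustrates this for the first candidate (T2 failing during x acceleration). The pair encoding of modes and the helper names `insert_fault_mode` and `fault_parameter_ranges` are hypothetical.

```python
def insert_fault_mode(modes, index, faulty):
    """Return the candidate mode sequence with a fault mode inserted after modes[index].

    Each mode is a (behavior, abnormal_components) pair; `faulty` is the set C.
    """
    faulty = frozenset(faulty)
    behavior, abnormal = modes[index]
    # Modes up to the fault are unchanged; the fault mode repeats the behavior
    # of the mode it interrupts, with ab(c) asserted for each c in C.
    candidate = list(modes[: index + 1]) + [(behavior, abnormal | faulty)]
    # ab(c) stays enforced in every subsequent mode.
    candidate += [(b, ab | faulty) for b, ab in modes[index + 1:]]
    return candidate


def fault_parameter_ranges(thrusters, faulty, full_range=(0.0, 100.0)):
    """theta_F: 0 for healthy thrusters, the whole range for the faulty ones."""
    return {t: (full_range if t in faulty else (0.0, 0.0)) for t in thrusters}


NOMINAL = [(b, frozenset()) for b in
           ("accelerate-x", "cruise-x", "decelerate-x", "accelerate-y", "cruise-y")]
THRUSTERS = ["T%d" % i for i in range(1, 13)]
```

For instance, `insert_fault_mode(NOMINAL, 0, {"T2"})` returns six modes: the nominal accelerate-x mode, a second accelerate-x mode with ab(T2), and the four remaining modes each carrying ab(T2).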


4.2 Model Fitting and Comparison 

Given the candidate qualitative diagnoses and their associated candidate models, the 
next phase of the diagnosis process is quantitative refinement of the qualitative can- 
didate diagnoses and their associated models through parameter estimation and data 
fitting, followed by tracking of the fit of subsequent observations to the candidate models.
The goal is to at least provide a probabilistic ranking of the plausible candidates, if
not a unique model (and hence diagnosis). 


² We may cut off back-propagation along the mode sequence beyond a time limit.


As observed in the previous section, the model associated with the candidate qualitative
diagnosis, Mod_c, is underconstrained. Both the time of the fault mode occurrence,
t_F, and the parameters associated with the faulty behavior, θ_F, are represented as ranges
and must be estimated. Further, the candidate qualitative diagnoses were generated from 
initial observations of aberrant behavior, and their consistency can be further evaluated 
by monitoring the qualitative transients associated with each candidate. The refinement 
process is performed by a set of trackers [21], one for each candidate diagnosis and 
associated model. Each tracker comprises both a qualitative transient analysis component
and a quantitative model estimation component. The two components operate in
parallel as described below. 

Qualitative Transient Analysis 

The qualitative transient analysis component performs a further qualitative analysis of 
the consistency of candidate qualitative diagnoses based on monitoring of higher-order 
transients whose manifestation is seen over a longer period of time. If the transients 
of a candidate qualitative diagnosis do not remain consistent with subsequent observations,
the candidate diagnosis will be eliminated and the model estimation component
informed. The technique we employ is derived from techniques for qualitative monitor- 
ing of continuous systems. Details of the algorithm appear in [18]. 

Model Estimation 

The purpose of the model estimation component is to perform quantitative model fit- 
ting, i.e., to provide a quantitative estimate of the parameters of the models and to assign 
a probability to each of the candidate models (and hence candidate diagnoses), given 
the noisy observed data. In particular, given a candidate model, Mod_c, the model
estimation component uses parameter estimation techniques to estimate both the time at
which the failure occurred, t_F, and the value for the parameters, θ_F, associated with the
conjectured failure mode. In this paper we discuss two alternate approaches to our time
and parameter estimation problem. The first approach is based on Expectation
Maximization (EM) (e.g., [8]), an iterative technique that converges to a locally optimal
value for t_F and θ_F simultaneously. The second approach we consider employs Generalized
Likelihood Ratio (GLR) techniques (e.g., [5]) to estimate the time of failure, t_F, and then
uses the observations obtained after the failure to estimate the fault parameters, θ_F, by a
least squares method. As described in Section 3, the outcome of both approaches is a unique
value for t_F and θ_F and a measure of the likelihood of Mod_c given the observations.
The proposed approaches to model fitting have trade-offs and we are currently assess- 
ing the efficacy of these and other alternative approaches through experimentation. 

EM-Based Approach The Expectation Maximization (EM) algorithm (e.g., [8]) pro- 
vides a technique for finding the maximum-likelihood estimate of the parameters of an 
underlying distribution from a given set of data, when that data is incomplete or has 
missing values. The parameter estimation problem we address in this paper is a variant
of the motion segmentation problem described in [24]. Here, we define the basic
algorithm and the intuition behind our approach. (See [8] for more details.) 

The time of failure, t_F = [t_l, t_u], of our candidate qualitative diagnosis dictates the
mode in which the failure is conjectured to have occurred. Let us call this mode μ_i.
The behavior of our hybrid system in mode μ_i is described by the continuous function
f_{μ_i}, with known parameters θ_i. At some (to be estimated) time point t_F within the
predicted time period of μ_i, we have conjectured that the system experienced a fault
which transitions it into mode μ_F. The behavior of our hybrid system in mode μ_F is
described by the continuous function f_{μ_F}, with unknown parameters, θ_F. We also have
a set of data points O' = [x_obs(t_l), ..., x_obs(t_u)] ⊆ O, which either reflect the behavior
of the system under f_{μ_i} or under f_{μ_F}.

Given all this information, our task is to find 1) values for parameters θ_F, and 2) an
assignment of the data points O' to either μ_i or μ_F so that we maximize the fit of the
data to the two functions. The assignment of data points will in turn tell us the value
of t_F. EM provides an iterative algorithm which converges to provide a maximum-
likelihood estimate for θ_F given O', i.e., roughly we are calculating the likelihood of θ_F,
L(θ_F) = P(O' | θ_F, Mod_c).

The basic EM algorithm comprises two steps: an Expectation Step (E Step), and a 
Maximization Step (M Step) [24]: 

• Select an initial (random) value for θ_F.

• Iterate until convergence:

- E Step: assign each data point to either f_{μ_i}(θ_i) or f_{μ_F}(θ_F), whichever fits it best.

- M Step: re-estimate θ_F using the data points assigned to f_{μ_F}(θ_F).

The assignment of data points to μ_i and μ_F provides an estimate for t_F. We may
exploit the fact that the assignment of data points is temporally correlated, with all points
before t_F belonging to μ_i, and all points after t_F belonging to μ_F. We may also exploit
the fact that data points at the beginning of the interval will belong to μ_i, while those
at the end will belong to μ_F. These task-specific qualities help our algorithm converge
more quickly.
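
The two steps and the temporal-ordering constraint can be sketched for a deliberately simple case. The code below is a hard-assignment variant of EM under strong assumptions that are not from the paper: the observed signal is a scalar that is constant within each mode (level θ_i before the fault, level θ_F after), the noise is Gaussian with equal variance in both modes so that squared error ranks the fits, and at least one sample belongs to each mode. The function name `fit_fault` is hypothetical.

```python
def fit_fault(ys, theta_nominal, theta_init, max_iter=50):
    """Estimate (m, theta_f): samples ys[m:] are attributed to the fault mode."""
    n = len(ys)
    theta_f, m = theta_init, None
    for _ in range(max_iter):
        # E step: assign samples to the nominal or the fault level. Because the
        # assignment is temporally ordered, it reduces to choosing one split m.
        best_m, best_cost = None, float("inf")
        for cand in range(1, n):
            cost = sum((y - theta_nominal) ** 2 for y in ys[:cand])
            cost += sum((y - theta_f) ** 2 for y in ys[cand:])
            if cost < best_cost:
                best_m, best_cost = cand, cost
        # M step: re-estimate theta_f from the samples assigned to the fault.
        new_theta = sum(ys[best_m:]) / (n - best_m)
        converged = best_m == m and abs(new_theta - theta_f) < 1e-12
        m, theta_f = best_m, new_theta
        if converged:
            break
    return m, theta_f
```

The split index m plays the role of t_F: restricting the E step to a single split is how the sketch exploits the fact that early samples belong to μ_i and late samples to μ_F.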

EM provides a rich algorithm for maximum-likelihood parameter estimation when 
we don’t know the value of t_F. In some hybrid diagnosis applications, depending upon
the sensors in our system, and the level of noise in the sensors, we may be able to develop
monitoring techniques that will help isolate a reasonable value for t_F, minimizing
the need for iteration in EM. In such cases, an alternative to the EM-based approach is
to first estimate t_F using the Generalized Likelihood Ratio (GLR) method [5], followed
by parameter estimation of θ_F.

GLR + Least Squares Approach Here, we divide the parameter estimation problem 
into two parts: (i) estimate the time of failure, t_F, using the Generalized Likelihood
Ratio (GLR) method, and (ii) apply a standard least squares method for parameter
estimation. The intuition is that solving the problem in two parts simplifies the estimation
process, and very likely mitigates the numerical convergence problems that arise in
dealing with complex higher-order models. 

The GLR method for detecting abrupt changes in continuous signals is described
in [5]. We have applied it to fault transients analysis in complex fluid thermal systems
[16]. Here we provide an overview of the method for the single parameter case. Assume
that the signal under scrutiny is a time-indexed sequence of random variables $y(k)$,
with probability density function $p_{\theta_i}(y)$ in desired mode $\mu_i$, and
$p_{\theta_F}(y)$ in fault mode $\mu_F$. $y$ is either contained in the observations
or computed from them. We assume that a fault causes an abrupt change in $y(k)$. In the
case of the AERCam, $y$ captures the difference between the observed and expected
values of, e.g., the acceleration, as predicted by the model.


The central quantity in the change detection algorithm is the cumulative sum of the
log-likelihood ratio for a window of observations between times $m$ and $n$,

$$S_m^n(\theta_F) = \sum_{k=m}^{n} \ln \frac{p_{\theta_F}(y(k))}{p_{\theta_i}(y(k))}.$$

Again, this ratio is a function of two unknowns: $t_F$ and $\theta_F$. The common
statistical solution is to use maximum likelihood estimates for these two parameters,
resulting in a double maximization:

$$g_n = \max_{1 \le m \le n} \, \sup_{\theta_F} S_m^n(\theta_F).$$

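If we assume that the probability density functions $p_{\theta_i}(y)$ and
$p_{\theta_F}(y)$ are Gaussian, then $g_n$ reduces to:

$$g_n = \frac{1}{2\sigma_i^2} \max_{1 \le m \le n} \frac{1}{n - m + 1} \left[ \sum_{k=m}^{n} \bigl( y(k) - \mu \bigr) \right]^2,$$

where $\mu$ and $\sigma_i^2$ are the mean and variance for $p_{\theta_i}(y)$,
respectively (here $\mu$ denotes the mean of the density, not a mode).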
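As a sketch of where the Gaussian form comes from, assume (beyond what is stated here) that the fault shifts only the mean of $y$, from $\mu$ to $\mu + \nu$, and leaves the variance $\sigma_i^2$ unchanged. The symbols $\nu$ and $N$ are introduced for this illustration only. Maximizing the cumulative log-likelihood ratio over the unknown shift $\nu$ then gives the inner term of $g_n$:

```latex
\begin{align*}
\ln\frac{p_{\theta_F}(y)}{p_{\theta_i}(y)}
  &= \frac{(y-\mu)^2 - (y-\mu-\nu)^2}{2\sigma_i^2}
   = \frac{2\nu\,(y-\mu) - \nu^2}{2\sigma_i^2}, \\
S_m^n(\nu)
  &= \frac{1}{2\sigma_i^2}\Bigl[\,2\nu \sum_{k=m}^{n}\bigl(y(k)-\mu\bigr) - N\nu^2\Bigr],
  \qquad N = n-m+1, \\
\hat{\nu}
  &= \frac{1}{N}\sum_{k=m}^{n}\bigl(y(k)-\mu\bigr), \\
\sup_{\nu} S_m^n(\nu)
  &= \frac{1}{2\sigma_i^2}\,\frac{1}{N}\Bigl[\sum_{k=m}^{n}\bigl(y(k)-\mu\bigr)\Bigr]^2 .
\end{align*}
```

Taking the maximum of the last line over $1 \le m \le n$ yields $g_n$ for this mean-shift case.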

When processing a sequence of samples, the point of abrupt change, $t_F$, is computed
from $\min\{n : g_n \geq h\}$, where $h$ is an appropriately defined threshold. Hence,
the smaller the value of $h$, the more sensitive the function is to change, and
unfortunately to false alarms, so $h$ must be set carefully.
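The thresholded detector can be sketched as follows for the Gaussian case, assuming the pre-change mean and variance are known and the fault changes only the mean. The function name, the 0-based indexing, and the choice to also return the maximizing window start (a common estimate of the change onset, not something stated in this paper) are assumptions made for illustration.

```python
def glr_gaussian(y, mean0, var0, h):
    """GLR test for a change in the mean of a Gaussian signal.

    For each n, computes
        g_n = max over m <= n of [sum_{k=m..n} (y[k] - mean0)]^2
              / (2 * var0 * (n - m + 1))
    and stops at the first n with g_n >= h.

    Returns (n, m_hat), both 0-based, where m_hat is the window start that
    maximised g_n, or None if the threshold is never reached.
    """
    for n in range(len(y)):
        best_g, best_m = float("-inf"), None
        s = 0.0
        # Grow the window backwards from n so the sum is updated in O(1).
        for m in range(n, -1, -1):
            s += y[m] - mean0
            g = s * s / (2.0 * var0 * (n - m + 1))
            if g > best_g:
                best_g, best_m = g, m
        if best_g >= h:
            return n, best_m
    return None
```

With `y = [0.0] * 5 + [1.0] * 5`, `mean0 = 0.0`, `var0 = 1.0` and `h = 2.0`, the alarm is raised at index 8 with the maximizing window starting at index 5, the first post-change sample. Lowering `h` raises the alarm earlier, at the price of more false alarms on noisy data. The double loop is quadratic in the number of samples, so a practical version would bound the window length.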

Once $t_F$ is estimated, data points observed after $t_F$ are used to estimate the
parameter, $\theta_F$, for a hypothesized fault using regression techniques. In the
case of the AERCam, the position vector of the AERCam is modeled as a set of quadratic
functions in terms of the thruster force. These functions contain one unknown,
$\theta_F$, the parameter that corresponds to the degree of degradation in the faulty
thruster. The least squares estimate for $\theta_F$ is computed, and the measure of fit
of the candidate model to the observed data is used to estimate the probability of the
candidate model (and hence, diagnosis).
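A one-parameter least squares fit of this kind can be sketched as below. The model is a stand-in, not the AERCam dynamics: a single position coordinate is assumed to follow $p(t) = p_0 + v_0 t + \tfrac{1}{2}\,\theta_F\, a\, t^2$, where $t$ is time since the estimated failure, $a$ is the nominal thruster acceleration, and $\theta_F \in [0, 1]$ scales it. All names, and the Gaussian noise model used to turn the residual into a log-likelihood, are assumptions.

```python
import math


def fit_degradation(times, positions, p0, v0, accel_nominal):
    """Least squares estimate of theta_F in
    p(t) = p0 + v0*t + 0.5*theta_F*accel_nominal*t**2 (illustrative model).

    Returns (theta_hat, rss), where rss is the residual sum of squares.
    """
    # Regressor: the part of the model that multiplies theta_F.
    xs = [0.5 * accel_nominal * t * t for t in times]
    # Response: what remains after removing the known terms.
    rs = [p - p0 - v0 * t for p, t in zip(positions, times)]
    theta_hat = sum(x * r for x, r in zip(xs, rs)) / sum(x * x for x in xs)
    rss = sum((r - theta_hat * x) ** 2 for x, r in zip(xs, rs))
    return theta_hat, rss


def gaussian_log_likelihood(rss, n, noise_var):
    """Log-likelihood of n residuals under assumed i.i.d. Gaussian noise."""
    return -0.5 * n * math.log(2.0 * math.pi * noise_var) - rss / (2.0 * noise_var)
```

For noise-free positions generated with $\theta_F = 0.5$, the fit recovers 0.5 with zero residual. On real data, the log-likelihood gives one possible measure of fit for comparing candidate models.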

Model Comparison 

From the model estimation component, each tracker computes the likelihood of its
model $Mod_C$, and hence of the associated candidate diagnosis
$\langle C, \mu_F, t_F, \theta_F \rangle$, as a measure of fit of the observations to
the model. As new data are observed, $\theta_F$ and $t_F$ are adjusted and
$P(Mod_C \mid O)$ is recomputed. If the likelihood of $Mod_C$ falls below a predefined
acceptable likelihood threshold, $a$, then its tracker is terminated, and the
associated candidate diagnosis $\langle C, \mu_F, t_F, \theta_F \rangle$ is removed
from the list of candidate diagnoses. Tracking terminates when a unique diagnosis is
obtained, or when the diagnoses are sufficiently discriminated to determine suitable
controller actions.
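The pruning step can be sketched as a filter over the active trackers. The dictionary representation, the candidate labels, and the function names are hypothetical, and only the unique-diagnosis stopping rule is shown; deciding when diagnoses are "sufficiently discriminated" depends on the controller and is left out.

```python
def prune_candidates(likelihoods, threshold):
    """Drop every candidate whose model likelihood fell below the threshold.

    likelihoods: dict mapping a candidate label to its current likelihood.
    Returns a new dict containing only the surviving candidates.
    """
    return {c: p for c, p in likelihoods.items() if p >= threshold}


def tracking_done(candidates):
    """Stop once at most one candidate diagnosis remains."""
    return len(candidates) <= 1
```

For instance, with likelihoods `{"thruster 1 degraded": 0.6, "thruster 2 degraded": 0.05, "thruster 3 stuck": 0.3}` and a threshold of 0.1, the second candidate is removed and tracking continues with the other two.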

5 Related Work 

The specific problem of diagnosing hybrid systems has received little attention to date,
although there is much related work. Within the AI community, there has been a great
deal of research on diagnosing static systems (e.g., [14]), while much less on
diagnosing discrete dynamical systems (e.g., [17, 25]), and qualitative representations
of continuous systems (e.g., [18]). Within the FDI community, the largest proportion of
research has focused on diagnosing continuous systems (e.g., [13, 11]). The most common
model-based approaches use observer schemes (e.g., [12, 20]), where the goal is to
design residual generators based on observed discrepancies, such that individual
residuals are sensitive to a particular subset of faults. There is also complementary
work by Basseville [4], using model-based statistical processing techniques for early
fault detection and residual identification. In [18], residual generation and analysis
are performed in a qualitative framework to address some of the computational issues
that arise in handling the complex dynamics that occur in fault transients, with some
preliminary work on building multiple observers for hybrid systems [19]. Diagnosis of
discrete-event systems has also been studied within the FDI community (e.g., [22, 15]).
Fabre et al. [10] have employed stochastic Petri nets based on a Hidden Markov Model
probabilistic scheme for alarm analysis. Unfortunately, it is not clear how to
systematically derive such representations from the physical system models that we
work with.


6 Summary 

In this paper we addressed the problem of diagnosing hybrid systems. The main
contributions of the paper are 1) formulation of the hybrid diagnosis problem as model
selection; 2) the exploitation of techniques for qualitative diagnosis of continuous
systems to reduce the diagnosis search space; and 3) the use of parameter estimation and
data fitting techniques for evaluation and comparison of candidate diagnoses. This work
continues with experimental analysis of the proposed techniques, and a more formal
characterization of our approach in terms of Bayesian model selection.